Fix: Gemma3TextConfig rope scaling assignments #41934
Rocketknight1 merged 2 commits into huggingface:main
Conversation
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma3
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Rocketknight1 left a comment
I'm guessing the context for this is that rope_scaling clobbering rope_parameters (the old behaviour) gave the right config for older models, but breaks on upcoming models that might have specific configs for, e.g., sliding_attention?
Yes, we've observed that.
Got it, thanks for the fix!
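To illustrate the clobbering discussed in this exchange, here is a minimal hypothetical sketch. The key names and dict layout are assumptions for illustration only, not the actual transformers internals or the merged diff:

```python
# Hypothetical sketch of the old vs. fixed behaviour; the key names and
# dict layout are illustrative assumptions, not the actual transformers code.

legacy = {"rope_type": "linear", "factor": 8.0}  # old-style rope_scaling dict
per_layer = {
    "full_attention": {"rope_type": "default", "rope_theta": 1_000_000.0},
    "sliding_attention": {"rope_type": "default", "rope_theta": 10_000.0},
}

# Old behaviour: a legacy rope_scaling dict replaces rope_parameters
# wholesale, silently dropping the sliding_attention entry above.
rope_parameters_old = legacy if legacy is not None else per_layer

# Fixed behaviour: merge the legacy scaling into the layer type it
# applies to, keeping the per-layer-type structure intact.
rope_parameters_fixed = dict(per_layer)
rope_parameters_fixed["full_attention"] = {**per_layer["full_attention"], **legacy}
```

The point is simply that a blanket `rope_parameters = rope_scaling` assignment loses per-layer-type entries, whereas a targeted merge preserves them.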
* Fix: Gemma3TextConfig rope scaling assignments
* Fix: type annotation for rope_parameters
What does this PR do?
Related to #41922, this PR corrects the assignment of the rope_scaling dictionary present on some Gemma 3 pre-trained models on the HF Hub when normalizing to the new rope_parameters value.
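As a concrete but hypothetical example of that normalization, the sketch below converts a legacy-style config dict into per-layer-type parameters. The helper name, the config keys shown, and the assumption that rope_scaling applies only to the full_attention (global) layers are illustrative assumptions, not a statement about the merged code:

```python
# Hypothetical normalization of a legacy config; helper name and key
# layout are assumptions for illustration only.

legacy_config = {
    "rope_theta": 1_000_000.0,          # base frequency for global layers
    "rope_local_base_freq": 10_000.0,   # base frequency for local layers
    "rope_scaling": {"rope_type": "linear", "factor": 8.0},
}

def to_rope_parameters(cfg: dict) -> dict:
    """Build per-layer-type RoPE parameters from a legacy config dict."""
    params = {
        "full_attention": {"rope_type": "default", "rope_theta": cfg["rope_theta"]},
        "sliding_attention": {"rope_type": "default", "rope_theta": cfg["rope_local_base_freq"]},
    }
    if cfg.get("rope_scaling") is not None:
        # Apply the legacy scaling only where it belongs instead of
        # overwriting the whole per-layer-type structure.
        params["full_attention"].update(cfg["rope_scaling"])
    return params

print(to_rope_parameters(legacy_config))
```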
Before submitting
* Did you read the contributor guideline, Pull Request section?
* Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
* Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@zucchini-nlp PTAL since you have been handling the RoPE changes.